Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures
The canonical approach in generative modeling is to split model fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an alternative route that ties sampling and mapping. We find inspiration in moment measures [Cordero-Erausquin and Klartag, 2015], a result that states that for any measure $\rho$, there exists a unique convex potential $u$ such that $\rho=\nabla u\,\sharp\,e^{-u}$. While this does seem to tie effectively sampling (from log-concave distribution $e^{-u}$) and action (pushing particles through $\nabla u$), we observe on simple examples (e.g., Gaussians or 1D distributions) that this choice is ill-suited for practical tasks. We study an alternative factorization, where $\rho$ is factorized as $\nabla w^*\,\sharp\,e^{-w}$, where $w^*$ is the convex conjugate of a convex potential $w$. We call this approach conjugate moment measures, and show far more intuitive results on these examples. Because $\nabla w^*$ is the Monge map between the log-concave distribution $e^{-w}$ and $\rho$, we rely on optimal transport solvers to propose an algorithm to recover $w$ from samples of $\rho$, and parameterize $w$ as an input-convex neural network. We also address the common sampling scenario in which the density of $\rho$ is known only up to a normalizing constant, and propose an algorithm to learn $w$ in this setting.
Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures
Vesseron, Nina, Béthune, Louis, Cuturi, Marco
A common approach to generative modeling is to split model-fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an alternative route that ties sampling and mapping. We find inspiration in moment measures, a result that states that for any measure $\rho$ supported on a compact convex set of $\mathbb{R}^d$, there exists a unique convex potential $u$ such that $\rho=\nabla u\,\sharp\,e^{-u}$. While this does seem to tie effectively sampling (from log-concave distribution $e^{-u}$) and action (pushing particles through $\nabla u$), we observe on simple examples (e.g., Gaussians or 1D distributions) that this choice is ill-suited for practical tasks. We study an alternative factorization, where $\rho$ is factorized as $\nabla w^*\,\sharp\,e^{-w}$, where $w^*$ is the convex conjugate of $w$. We call this approach conjugate moment measures, and show far more intuitive results on these examples. Because $\nabla w^*$ is the Monge map between the log-concave distribution $e^{-w}$ and $\rho$, we rely on optimal transport solvers to propose an algorithm to recover $w$ from samples of $\rho$, and parameterize $w$ as an input-convex neural network.
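The two factorizations can be compared by hand on a centered 1D Gaussian $\rho=\mathcal{N}(0,\sigma^2)$. The sketch below is only an illustration under the assumption that both potentials are quadratic, $u(x)=w(x)=x^2/(2s)$ up to an additive constant; the function names are hypothetical and this is not the paper's solver. Under that assumption the moment measure factorization samples from a Gaussian of variance $1/\sigma^2$, while the conjugate factorization samples from one of variance $\sigma^{2/3}$.

```python
import numpy as np

def moment_factorization_1d(sigma2):
    """Moment measure rho = grad(u) # e^{-u} for rho = N(0, sigma2).

    Assumes u(x) = x^2 / (2 s): e^{-u} is N(0, s) and grad u(x) = x / s,
    so the push-forward has variance s / s^2 = 1 / s, hence s = 1 / sigma2.
    Returns (sampling variance s, slope of the linear map).
    """
    s = 1.0 / sigma2
    return s, 1.0 / s

def conjugate_moment_factorization_1d(sigma2):
    """Conjugate moment measure rho = grad(w*) # e^{-w} for rho = N(0, sigma2).

    Assumes w(x) = x^2 / (2 s): w*(y) = s y^2 / 2 and grad w*(y) = s y,
    so the push-forward has variance s^2 * s = s^3, hence s = sigma2^(1/3).
    Returns (sampling variance s, slope of the linear map).
    """
    s = sigma2 ** (1.0 / 3.0)
    return s, s

def pushforward_variance(s, slope, n=200_000, seed=0):
    """Empirical variance of slope * X with X ~ N(0, s)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(s), size=n)
    return np.var(slope * x)

sigma2 = 4.0
for name, factorize in [("moment", moment_factorization_1d),
                        ("conjugate", conjugate_moment_factorization_1d)]:
    s, slope = factorize(sigma2)
    print(name, "sampling variance:", s,
          "push-forward variance:", pushforward_variance(s, slope))
```

In this toy case the moment measure noise shrinks as the target spreads out, whereas the conjugate version keeps the noise on a scale that grows with the target.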
Differentially Private Conditional Independence Testing
Kalemaj, Iden, Kasiviswanathan, Shiva Prasad, Ramdas, Aaditya
Conditional independence (CI) tests are widely used in statistical data analysis, e.g., they are the building block of many algorithms for causal graph discovery. The goal of a CI test is to accept or reject the null hypothesis that $X \perp \!\!\! \perp Y \mid Z$, where $X \in \mathbb{R}, Y \in \mathbb{R}, Z \in \mathbb{R}^d$. In this work, we investigate conditional independence testing under the constraint of differential privacy. We design two private CI testing procedures: one based on the generalized covariance measure of Shah and Peters (2020) and another based on the conditional randomization test of Candès et al. (2016) (under the model-X assumption). We provide theoretical guarantees on the performance of our tests and validate them empirically. These are the first private CI tests with rigorous theoretical guarantees that work for the general case when $Z$ is continuous.
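For orientation, the standard non-private generalized covariance measure can be sketched as follows. This is an illustration only: it assumes plain linear least-squares regressions of $X$ and $Y$ on $Z$, the name `gcm_test` is hypothetical, and the privacy mechanism of the paper is not reproduced here.

```python
import math
import numpy as np

def gcm_test(x, y, z):
    """Non-private GCM statistic and two-sided asymptotic p-value.

    Assumption: the regressions of x and y on z are linear with intercept.
    """
    n = x.shape[0]
    design = np.column_stack([np.ones(n), z])  # intercept + covariates
    # Residuals of x and y after regressing each on z.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    prod = res_x * res_y
    # Normalized mean of residual products, asymptotically N(0, 1) under the null.
    stat = math.sqrt(n) * prod.mean() / prod.std()
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value

rng = np.random.default_rng(0)
n, d = 2000, 3
z = rng.normal(size=(n, d))
x = z @ rng.normal(size=d) + rng.normal(size=n)
y_null = z @ rng.normal(size=d) + rng.normal(size=n)  # X independent of Y given Z
y_dep = y_null + x                                    # X and Y dependent given Z
print("null case:", gcm_test(x, y_null, z))
print("dependent case:", gcm_test(x, y_dep, z))
```

A small p-value leads to rejecting the null hypothesis of conditional independence.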
Implicit regularization of deep residual networks towards neural ODEs
Marion, Pierre, Wu, Yu-Han, Sander, Michael E., Biau, Gérard
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth analog, neural ordinary differential equations (ODEs), are also widely used. Despite their success, the link between the discrete and continuous models still lacks a solid mathematical foundation. In this article, we take a step in this direction by establishing an implicit regularization of deep residual networks towards neural ODEs, for nonlinear networks trained with gradient flow. We prove that if the network is initialized as a discretization of a neural ODE, then such a discretization holds throughout training. Our results are valid for a finite training time, and also as the training time tends to infinity provided that the network satisfies a Polyak-Lojasiewicz condition. Importantly, this condition holds for a family of residual networks where the residuals are two-layer perceptrons with an overparameterization in width that is only linear, and implies the convergence of gradient flow to a global minimum. Numerical experiments illustrate our results.
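The sense in which a residual network discretizes a neural ODE can be illustrated with a forward pass written as an explicit Euler scheme. The sketch below makes several assumptions that are not taken from the article: a $1/L$ scaling of the residual branch, a `tanh` residual, and weights $W(t)=\cos(t)A+\sin(t)B$ that vary smoothly with the depth variable $t=k/L$. It only checks that the output stabilizes as the depth $L$ grows; it does not involve training.

```python
import numpy as np

def resnet_forward(h0, depth, A, B):
    """Residual network h_{k+1} = h_k + (1/L) tanh(W(k/L) h_k).

    With W(t) = cos(t) A + sin(t) B, this is the explicit Euler scheme
    for the ODE dh/dt = tanh(W(t) h) on [0, 1] with step 1/L.
    """
    h = h0.copy()
    for k in range(depth):
        t = k / depth
        W = np.cos(t) * A + np.sin(t) * B
        h = h + np.tanh(W @ h) / depth
    return h

rng = np.random.default_rng(0)
d = 4
A = 0.5 * rng.normal(size=(d, d))
B = 0.5 * rng.normal(size=(d, d))
h0 = rng.normal(size=d)

# A very deep network stands in for the continuous-depth limit.
reference = resnet_forward(h0, 20_000, A, B)
for depth in (10, 100, 1000):
    gap = np.linalg.norm(resnet_forward(h0, depth, A, B) - reference)
    print(depth, gap)
```

The printed gaps should shrink as the depth increases, which is the behavior expected from an Euler discretization with smooth weights.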
Uniform Convergence of Deep Neural Networks with Lipschitz Continuous Activation Functions and Variable Widths
We consider deep neural networks with a Lipschitz continuous activation function and with weight matrices of variable widths. We establish a uniform convergence analysis framework in which sufficient conditions on weight matrices and bias vectors together with the Lipschitz constant are provided to ensure uniform convergence of the deep neural networks to a meaningful function as the number of their layers tends to infinity. In the framework, special results on uniform convergence of deep neural networks with a fixed width, bounded widths and unbounded widths are presented. In particular, as convolutional neural networks are special deep neural networks with weight matrices of increasing widths, we put forward conditions on the mask sequence which lead to uniform convergence of the resulting convolutional neural networks. The Lipschitz continuity assumption on the activation functions allows us to include in our theory most of the activation functions commonly used in applications.
Matching for causal effects via multimarginal optimal transport
Gunsilius, Florian, Xu, Yuliang
Identifying cause and effect is one of the primary goals of scientific research. The leading approaches to uncover causal effects are randomized controlled trials. Unfortunately, such trials are often practically infeasible on ethical grounds, might not be generalizable beyond the experimental setting due to lack of variation in the population, or simply have too few participants to generate robust results due to financial or logistical restrictions. An attractive alternative is to use observational data, which are ubiquitous, often readily available, and comprehensive. The main challenge in using observational data for causal inference is the fact that assignment into treatment is not perfectly randomized. This implies that individuals assigned to different treatments may possess systematically different observable and unobservable covariates. Comparing the outcomes between individuals in different treatment groups may then yield a systematically biased estimator of the true causal effect. Matching methods are designed to balance the treatment samples in such a way that differences between the observed covariates of the groups are minimized. This allows the researcher to directly compare the balanced treatment groups for estimating the true causal effect under the assumption that the unobservable covariates of individuals are similar if their observed covariates are similar.
A Function Fitting Method
In this article we present a function fitting method formulated as a convex minimization problem, which can be solved using a gradient descent algorithm. We also provide some analysis of how well the fitted function matches the data. The solution of the function fitting problem is also shown to solve a linear, weak PDE which contains some global terms. We describe a simple numerical solution using a gradient descent algorithm that converges uniformly to the actual solution. As the minimization problem is also that of a quadratic form, there also exists a numerical method using linear algebra.
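The equivalence between the two numerical routes can be illustrated on a generic quadratic form. The sketch below assumes an objective $\tfrac12 x^\top A x - b^\top x$ with a symmetric positive definite matrix $A$; the matrix, the vector and the function name are made up for illustration and are not the functional studied in the article.

```python
import numpy as np

def gradient_descent_quadratic(A, b, n_steps=5000):
    """Minimize 0.5 x^T A x - b^T x for a symmetric positive definite A."""
    # Step size 1 / lambda_max, the inverse Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.eigvalsh(A).max()
    x = np.zeros_like(b)
    for _ in range(n_steps):
        x = x - step * (A @ x - b)  # gradient of the objective is A x - b
    return x

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
A = M.T @ M + np.eye(5)           # symmetric positive definite by construction
b = rng.normal(size=5)

x_gd = gradient_descent_quadratic(A, b)
x_direct = np.linalg.solve(A, b)  # linear-algebra route: solve A x = b
print(np.max(np.abs(x_gd - x_direct)))
```

Both routes target the same minimizer, since the gradient of the quadratic form vanishes exactly when $Ax=b$.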